@r-vasquez
Contributor

Fixes UX-97

53a2ac8 pins the Console tag to v2.8.5 while we complete UX-31, so that rpk does not ship with a missing configuration once Console releases its upcoming version.

The rest of the commits address UX-97 and improve the error messages in failure scenarios. Examples below:

Stranded container scenario:

# Before
$ rpk container start
Waiting for the cluster to be ready...

unable to start cluster: config erroneously has no seed brokers

Errors reported from the Docker container:

# Now:
$ rpk container start 
unable to start cluster: stranded Console container detected; please run 'rpk container purge' and try again

Error in one of the containers:

$ rpk container start -n 5
Checking for a local image...
Creating network "redpanda"
Starting cluster...
Waiting for the cluster to be ready...
unable to start cluster: expected 5 nodes, got 3

Errors reported from the Docker container with ID 1e1929f478ae:

+ '[' '' = true ']'
+ exec /usr/bin/rpk redpanda start --node-id 2 --kafka-addr internal://0.0.0.0:9092,external://172.24.1.4:19092 --advertise-kafka-addr internal://172.24.1.4:9092,external://127.0.0.1:11092 --pandaproxy-addr internal://0.0.0.0:8082,external://172.24.1.4:18082 --advertise-pandaproxy-addr internal://172.24.1.4:8082,external://127.0.0.1:10082 --schema-registry-addr 172.24.1.4:8081 --rpc-addr 172.24.1.4:33145 --advertise-rpc-addr 172.24.1.4:33145 --mode dev-container --seeds 172.24.1.2:33145
WARNING: This is a setup for development purposes only; in this mode your clusters may run unrealistically fast and data can be corrupted any time your computer shuts down uncleanly.
WARN  2025-03-29 00:26:55,197 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:65536 available:0 requested:22052
Could not initialize seastar: std::runtime_error (Could not setup Async I/O: Not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)

^ Previously, we printed only the logs of the first container, not those of the container that failed.
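To illustrate the change in log selection described above, here is a minimal sketch in Go. The `nodeState` type and the `failedContainers` helper are hypothetical names chosen for this example, not rpk's actual code; the sketch only assumes that each container reports whether it is running.

```go
package main

import "fmt"

// nodeState is a hypothetical snapshot of one broker container.
// It is an assumption for this sketch, not a type from rpk.
type nodeState struct {
	ContainerID string
	Running     bool
}

// failedContainers returns the IDs of the containers that are not running,
// so that logs can be fetched from them instead of from the first node.
func failedContainers(nodes []nodeState) []string {
	var ids []string
	for _, n := range nodes {
		if !n.Running {
			ids = append(ids, n.ContainerID)
		}
	}
	return ids
}

func main() {
	nodes := []nodeState{
		{ContainerID: "node-0", Running: true},
		{ContainerID: "1e1929f478ae", Running: false},
	}
	for _, id := range failedContainers(nodes) {
		fmt.Printf("Errors reported from the Docker container with ID %s:\n", id)
	}
}
```

Filtering by state rather than by position means that a failure on any node surfaces its own logs, which matches the `-n 5` example above.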

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

We are going to release a new v3 console version
that contains configuration changes.

Ideally, the console tag should be passed at build
time, but in the meantime, we will fix the tag to
v2.8.5 to avoid automatically upgrading to v3 in
rpk without the config changes.
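One way the build-time approach mentioned above could look in Go is a package-level string variable with a pinned default, which the linker can override. This is only a sketch: the variable name `consoleTag`, the helper `consoleImage`, and the repository string are assumptions for illustration, not rpk's actual identifiers.

```go
package main

import "fmt"

// consoleTag pins the Console image tag (hypothetical variable name).
// Because it is a package-level string variable, it could be overridden
// at build time without editing the source, for example:
//
//	go build -ldflags "-X main.consoleTag=v2.8.5"
var consoleTag = "v2.8.5"

// consoleImage builds the full image reference for a given repository.
func consoleImage(repo string) string {
	return repo + ":" + consoleTag
}

func main() {
	// The repository name is a placeholder for this example.
	fmt.Println(consoleImage("example.org/console"))
}
```

With this shape, the pinned default guards against picking up v3 automatically, while a release build can still choose a different tag explicitly.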
For stranded console containers.

This could happen if the user manually stops
and removes the Redpanda broker containers and
leaves the Console container up.

This provides a better error message than just
an empty log from Docker.
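A minimal sketch of how such a check might look, assuming the start path can list which containers exist. The `clusterContainers` type and the `checkStranded` function are hypothetical; only the error text is taken from the PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// errStrandedConsole mirrors the message shown in the PR description.
var errStrandedConsole = errors.New("stranded Console container detected; please run 'rpk container purge' and try again")

// clusterContainers is a hypothetical summary of what was found on the
// Docker host; it is not rpk's actual type.
type clusterContainers struct {
	BrokerIDs []string // IDs of the Redpanda broker containers
	ConsoleID string   // ID of the Console container, empty if absent
}

// checkStranded reports an error when a Console container exists
// but no broker containers remain.
func checkStranded(c clusterContainers) error {
	if c.ConsoleID != "" && len(c.BrokerIDs) == 0 {
		return errStrandedConsole
	}
	return nil
}

func main() {
	found := clusterContainers{ConsoleID: "c0ffee"}
	if err := checkStranded(found); err != nil {
		fmt.Printf("unable to start cluster: %v\n", err)
	}
}
```

Failing early with a specific error gives the user an actionable next step before any container logs are consulted.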
Instead of printing the logs of the first container
by default.
@r-vasquez r-vasquez requested review from a team and kbatuigas as code owners March 29, 2025 00:53
@r-vasquez r-vasquez requested review from malinskibeniamin and removed request for a team March 29, 2025 00:53
@r-vasquez r-vasquez requested a review from weeco March 29, 2025 01:33
@vbotbuildovich
Collaborator

vbotbuildovich commented Mar 29, 2025

CI test results

test results on build#63910
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/63910#0195dfaa-11f2-4b94-b266-ad4e17f493bb | FLAKY | 1/2 |
| rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.False | ducktape | https://buildkite.com/redpanda/redpanda/builds/63910#0195dfbe-350a-4a4b-9f27-04d894d7d38c | FLAKY | 1/2 |
| rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.True | ducktape | https://buildkite.com/redpanda/redpanda/builds/63910#0195dfbe-350a-4de1-826e-74d6ea249ac4 | FLAKY | 1/2 |
| rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=-1.restart=True.controller_snapshots=False | ducktape | https://buildkite.com/redpanda/redpanda/builds/63910#0195dfbe-3509-4283-b8c4-5ce388ced878 | FLAKY | 1/2 |
test results on build#63986
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/63986#0195ed7e-e7ce-4d2b-b4d4-4e486fb147c3 | FLAKY | 1/2 |
| rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False | ducktape | https://buildkite.com/redpanda/redpanda/builds/63986#0195ed7b-ae08-4922-b339-91cf3fbb4425 | FLAKY | 1/3 |
| rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic | ducktape | https://buildkite.com/redpanda/redpanda/builds/63986#0195ed7e-e7ce-4d2b-b4d4-4e486fb147c3 | FLAKY | 1/2 |

weeco previously approved these changes Mar 29, 2025
A typical example would be:
 the user tries to start an n=3 cluster.
 node n-1 fails.
 console doesn't start.

Then, when the user tries to start again, rpk will
trigger the restart path if the user didn't purge
the previous cluster.

This prevents a panic, as at this point the console
container state will be nil.
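The guard described in this commit message can be sketched as a nil check before dereferencing the Console state. The `containerState` and `clusterState` types and the `consoleRunning` helper below are hypothetical and only illustrate the pattern; they are not rpk's actual code.

```go
package main

import "fmt"

// containerState is a hypothetical view of one container's state.
type containerState struct {
	Running bool
}

// clusterState holds a pointer to the Console state, which stays nil
// when the Console container was never created (assumed layout).
type clusterState struct {
	Console *containerState
}

// consoleRunning checks for nil before reading the state, so a cluster
// without a Console container does not cause a nil pointer dereference.
func consoleRunning(s clusterState) bool {
	return s.Console != nil && s.Console.Running
}

func main() {
	// A partially started cluster: a broker failed and Console never started.
	partial := clusterState{}
	fmt.Println("console running:", consoleRunning(partial))
}
```

Because Go evaluates `&&` left to right and short-circuits, the `Running` field is never read when the pointer is nil.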
@r-vasquez
Contributor Author

@weeco Sorry, I pushed dcd2220 which was missing in my original push 😅.

@r-vasquez r-vasquez requested a review from weeco March 31, 2025 21:59
@r-vasquez r-vasquez merged commit b963b4a into redpanda-data:dev Apr 1, 2025
25 checks passed
@vbotbuildovich
Collaborator

/backport v25.1.x

@vbotbuildovich
Collaborator

/backport v24.3.x

@vbotbuildovich
Collaborator

/backport v24.2.x
